How Knowledge Graphs Bring Order to the HRA’s Data Diversity
It takes a lot of data to construct the Human Reference Atlas (HRA).
But not all data is created equal!
HRA data comes from many different sources that may use different technologies and
follow different protocols.
The data itself comes in many different formats, some of which may require
particular software to read.
Some of it is old data, and some of it is new.
It may have been mixed with other data or repurposed.
Some data might be open research data, available to all,
while other data might have restrictions that limit access, use, and distribution.
For the HRA, we need the ability to easily find the data we
want, use it for our purposes, and share it as widely as possible.
Of course, that data needs to be structured so that machines can read it.
Ideally, though, that data structure would also be understandable to humans.
It would not only show what data exists in the HRA but also how pieces of that data
relate to each other.
By labeling our data and connecting our labeled nodes with relational links,
we put our data into context and create a framework for moving from data to
knowledge to insight.
The type of data structure we are moving towards here is known as a “knowledge
graph,” and knowledge graphs are a lot more common than you might think.
Google popularized the term back in 2012.
But now major companies like
Facebook, Amazon, and Netflix all use knowledge graphs to represent relationships between people,
products, and concepts.
A knowledge graph gathers all the things that are important to a particular group or organization.
These things can be people, places, entities, concepts, databases, documents—really just about anything.
Each of those entities is assigned a node. The knowledge graph then organizes all
those nodes into a network of interrelations.
In the case of the Human Reference Atlas, we are interested in things like biological data, research metadata, and data about the digital objects within the HRA.
Using the Resource Description Framework (RDF), each statement about these things is
expressed as a subject, a predicate, and an object.
The predicate expresses the relationship between the entities.
This grouping is called a triple, and the relation between an anatomical structure and
its parent organ might look like this.
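In code, a triple is simply three values kept in a fixed order. Here is a minimal Python sketch of that idea. The `ex:` identifiers and the `Triple` and `describe` names are placeholders invented for illustration; they are not actual HRA or ontology identifiers.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    object: str


# Placeholder identifiers (assumptions for illustration only).
renal_cortex_in_kidney = Triple(
    subject="ex:renal_cortex",
    predicate="ex:part_of",
    object="ex:kidney",
)


def describe(triple: Triple) -> str:
    """Render a triple as a readable arrow between two nodes."""
    return f"{triple.subject} --{triple.predicate}--> {triple.object}"


print(describe(renal_cortex_in_kidney))
```

Read left to right, the printed line says that the renal cortex is part of the kidney: the predicate sits between the two entities it relates.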
Let's see how this might look for a particular digital object created for the Human
Reference Atlas.
Here's a 3D reference organ for the left female kidney.
And here's how it appears in the knowledge graph.
The subject entries in the left column all point to the same thing:
the HRA's 3D reference organ of the left female kidney.
The object column lists all the other data in the HRA that the reference organ is
connected to.
And the predicate column indicates the nature of that relationship.
A closer look at these predicates reveals relationships such as the creation date,
version number, the raw data the 3D kidney was derived from, and many more.
What we see here is actually a network of nodes and edges: our kidney reference
organ is the central node, and all its related data is connected to it by labeled edges.
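The step from rows of triples to a network can be sketched in a few lines of Python: group every triple that shares a subject, and each predicate becomes a labeled edge leading out of the central node. All identifiers and values below are hypothetical placeholders, not actual entries from the HRA knowledge graph.

```python
from collections import defaultdict

# Hypothetical triples about one digital object (placeholder values).
KIDNEY = "ex:ref-organ/kidney-female-left"
triples = [
    (KIDNEY, "ex:creation_date", "YYYY-MM-DD"),
    (KIDNEY, "ex:version", "vX.Y"),
    (KIDNEY, "ex:derived_from", "ex:raw-data/kidney-scan"),
]


def edges_by_subject(rows):
    """Map each subject node to its outgoing (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in rows:
        graph[subject].append((predicate, obj))
    return dict(graph)


network = edges_by_subject(triples)
for predicate, obj in network[KIDNEY]:
    print(f"{KIDNEY} --{predicate}--> {obj}")
```

Because every row shares the same subject, the result has a single central node with three labeled edges fanning out from it.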
Of course, this is only one network. There are over 500 digital objects currently in
the HRA, each with its own network. And each network is connected to all the others.
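One way to picture how separate networks join up: when two digital objects point at the same node, their networks become one. The sketch below, again using made-up placeholder identifiers rather than real HRA data, pools triples from two objects and checks whether one can be reached from the other by walking edges in either direction.

```python
from collections import defaultdict, deque

# Hypothetical triples for two digital objects that share one node.
triples = [
    ("ex:ref-organ/kidney", "ex:represents", "ex:kidney"),
    ("ex:table/kidney-cells", "ex:describes", "ex:kidney"),
    ("ex:table/kidney-cells", "ex:version", "vX.Y"),
]


def neighbors(rows):
    """Build an undirected adjacency map from (subject, predicate, object) rows."""
    adjacent = defaultdict(set)
    for subject, _predicate, obj in rows:
        adjacent[subject].add(obj)
        adjacent[obj].add(subject)
    return adjacent


def is_connected(rows, start, goal):
    """Breadth-first search: can we walk from start to goal along any edges?"""
    adjacent = neighbors(rows)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in adjacent[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return False


print(is_connected(triples, "ex:ref-organ/kidney", "ex:table/kidney-cells"))
```

Here the two objects never reference each other directly; the shared `ex:kidney` node is what ties their networks together.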
Using a knowledge graph helps us structure the massive amount and variety of
data that powers the HRA.
It will also allow us to link up with other information
networks to create a wide and radically open web of knowledge about the human body.
External links
Want to learn how the HRA puts all that data to use? Check out this overview of some of the neat things the HRA can do!
So now you're like, "Enough talk already! Show me the data!!" Relax: you can hook up to HRA data by visiting our API page.
What? You don't know what an HRA is or why we really, really need one? Get caught up with our very first scrollytale!